Implementing a Simple Web Crawler with Python2

2018-04-10 277 views Other web crawler Python CSDN Blog

This project is a simple web crawler designed to scrape relevant content from CSDN blogs and save it as HTML files. It includes the basic process of a crawler: crawling, parsing, and storage.

### Crawling Process

1. **Scheduler (`spider_main.py`)**:
   - This is the entry point of the entire project.
   - It calls `HtmlOutputer` to output data, `Downloader` to download web page content, and `HtmlParser` to parse the downloaded content (parsing logic continues...).
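
To make the scheduler's role concrete, here is a minimal sketch of how `spider_main.py` might wire the three components together. Only the class names `Downloader`, `HtmlParser`, and `HtmlOutputer` come from the description above; the `SpiderMain` class, every method name (`download`, `parse`, `collect`, `to_html`, `crawl`), the `max_pages` parameter, and the `example.invalid` URLs are assumptions made for illustration. The downloader here reads from an in-memory dictionary instead of the network so the example runs offline, and the code avoids version-specific features so it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch of spider_main.py. Method names are assumptions,
# not the original project's API.
import re


def _escape(text):
    """Escape the characters that would break the HTML report."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


class Downloader(object):
    """Returns page content; backed by a dict here instead of real HTTP."""

    def __init__(self, pages):
        self.pages = pages

    def download(self, url):
        return self.pages.get(url)  # None when the page is unavailable


class HtmlParser(object):
    """Extracts the page title and outgoing links from raw HTML."""

    def parse(self, url, html):
        match = re.search(r"<title>(.*?)</title>", html, re.S)
        title = match.group(1).strip() if match else ""
        links = re.findall(r'href="([^"]+)"', html)
        return links, {"url": url, "title": title}


class HtmlOutputer(object):
    """Collects parsed records and renders them as one HTML table."""

    def __init__(self):
        self.items = []

    def collect(self, data):
        self.items.append(data)

    def to_html(self):
        rows = "".join(
            "<tr><td>%s</td><td>%s</td></tr>"
            % (_escape(d["url"]), _escape(d["title"]))
            for d in self.items
        )
        return "<html><body><table>%s</table></body></html>" % rows


class SpiderMain(object):
    """Scheduler: drives download -> parse -> output for each pending URL."""

    def __init__(self, downloader, parser, outputer):
        self.downloader = downloader
        self.parser = parser
        self.outputer = outputer

    def crawl(self, root_url, max_pages=10):
        pending = [root_url]
        seen = set([root_url])
        crawled = 0
        while pending and crawled < max_pages:
            url = pending.pop(0)
            html = self.downloader.download(url)
            if html is None:
                continue  # skip pages that could not be fetched
            links, data = self.parser.parse(url, html)
            self.outputer.collect(data)
            crawled += 1
            for link in links:
                if link not in seen:  # never queue the same URL twice
                    seen.add(link)
                    pending.append(link)
        return self.outputer.to_html()


# Two fake pages that link to each other, standing in for blog posts.
FAKE_PAGES = {
    "http://example.invalid/a":
        '<title>Post A</title><a href="http://example.invalid/b">b</a>',
    "http://example.invalid/b":
        '<title>Post B</title><a href="http://example.invalid/a">a</a>',
}

spider = SpiderMain(Downloader(FAKE_PAGES), HtmlParser(), HtmlOutputer())
report = spider.crawl("http://example.invalid/a")
```

In a real crawler, `Downloader.download` would issue an HTTP request and `report` would be written to an `.html` file. Passing the components into the scheduler's constructor keeps each one replaceable, which is why the sketch can swap in an offline downloader without touching the crawl loop.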